Method for object recognition

ABSTRACT

The present disclosure proposes a computer-implemented method of object recognition of an object to be identified using a method for reconstruction of a 3D point cloud. The method comprises the steps of acquiring, by a mobile device, a plurality of pictures of said object, sending the acquired pictures to a cloud server, reconstructing, by the cloud server, a 3D points cloud reconstruction of the object, and performing a 3D match search in a 3D database using the 3D points cloud reconstruction, to identify the object, the 3D match search comprising a comparison of the reconstructed 3D points cloud of the object with 3D points clouds of known objects stored in the 3D database.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 17/112,419, filed Dec. 4, 2020, entitled “SYSTEMS AND METHODS FOR PERFORMING A 3D MATCH SEARCH IN A 3D DATABASE BASED ON 3D PRIMITIVES AND A CONNECTIVITY GRAPH”, which is a continuation of U.S. patent application Ser. No. 16/652,927, filed Apr. 1, 2020, entitled “METHOD FOR 3D OBJECT RECOGNITION BASED ON 3D PRIMITIVES”, which is a national stage application filed under 35 U.S.C. § 371 of International Patent Application No. PCT/CA2018/000186, filed Oct. 5, 2018, which claims the benefit of Luxembourg Patent Application No. LU100465, filed Oct. 5, 2017, each of which is incorporated by reference herein in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

None

BACKGROUND OF THE INVENTION

Searching, identifying, and connecting an object to the network is one of the major issues of the years to come. As the World Wide Web becomes more and more mobile, recognition processes and techniques have to be adapted to mobile users and thus to mobile devices. Several techniques have already been developed to do so, such as 2D picture analysis, optical character recognition (O.C.R.), QR-codes or bar-codes, geolocation, and color recognition. They prove very useful and efficient in particular cases, like O.C.R. for books, geolocation for monuments, or QR-codes when present, but lack efficiency in most cases. Indeed, objects in today's life are mainly defined in 3D, and 3D parameters have to be taken into consideration to recognize them. Those parameters include peaks, tops, edges, shapes, and reliefs.

An object of the invention is to propose a method for recognition of an object using 3D parameters, yet without scanning the object in 3D. In other words, the present invention's purpose is not to develop a 3D scanner but to use three-dimensional parameters to recognize objects.

BRIEF SUMMARY OF THE INVENTION

Therefore, the present invention proposes a computer-implemented method of object recognition of an object to be identified, the method comprising the steps of acquiring, by a mobile device, a plurality of pictures of said object, sending the acquired pictures to a cloud server, reconstructing, by the cloud server, a 3D points cloud reconstruction of the object, and performing a 3D match search in a 3D database using the 3D points cloud reconstruction, to identify the object, the 3D match search comprising a comparison of the 3D points cloud reconstruction of the object with 3D points clouds of known objects stored in the 3D database. A 3D point cloud is a data set, so the method comprises comparing the reconstructed 3D data set (the 3D points cloud reconstruction) with known 3D data sets of known objects.

In an aspect, the comparison of the 3D points cloud reconstruction of the object with 3D points clouds of known objects stored in the 3D database includes at least one of machine learning or 3D geometric comparison. In this aspect, the present invention therefore proposes a method of recognition using 3D points clouds in two noticeable ways: geometrical matching/deep matching on the one hand, and 3D machine learning on the other hand. To achieve these goals, 3D reconstruction of objects is required. Those 3D reconstructed models may be analyzed by extracting specific parameters used for further recognition. Those 3D “sub-parameters” feed the recognition pipeline in its two branches (geometrical and machine learning).

In an aspect, the machine learning comprises the step of splitting the 3D point cloud reconstruction into a plurality of 3D descriptors, wherein the 3D descriptors include ones of planes, spheres, cylinders, cones, cubes, and tori, and wherein the 3D descriptors are split into a plurality of 3D primitives associated with the 3D descriptors, and wherein the plurality of 3D primitives are spatially connected through connectivity graphs describing their spatial connectivity forming the object. The 3D search match may be performed using the extracted plurality of primitives and the associated connectivity graph. 3D descriptors and geometrical “primitives” can be derived from the 3D reconstructed models, whereby the descriptors are “simple objects”, also called “primitives”, such as planes, spheres, cylinders, cones, cubes or tori. In a reverse process, any 3D object can be separated into a collection of those elementary shapes. Those elementary shapes are then spatially connected to each other through graphs that describe their spatial connectivity to form the whole object. The combination of matching small objects (primitives) with their connectivity graphs is a tool for an efficient matching, as illustrated by the sketch below.
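By way of illustration only (a minimal sketch with hypothetical structures in Python, not the claimed implementation), a decomposition into primitives and a connectivity graph could be held in memory as follows:

    # Hypothetical structures: an object decomposed into 3D primitives
    # plus a connectivity graph recording which primitives touch.
    from dataclasses import dataclass, field

    @dataclass
    class Primitive:
        kind: str     # "plane", "sphere", "cylinder", "cone", "cube" or "torus"
        params: dict  # e.g. {"radius": 4.0, "height": 10.0}

    @dataclass
    class ObjectGraph:
        primitives: list = field(default_factory=list)
        edges: set = field(default_factory=set)  # pairs of connected primitive indices

        def add_primitive(self, p):
            self.primitives.append(p)
            return len(self.primitives) - 1

        def connect(self, i, j):
            self.edges.add((min(i, j), max(i, j)))

    # A mug, roughly: a cylindrical body connected to a torus-like handle.
    mug = ObjectGraph()
    body = mug.add_primitive(Primitive("cylinder", {"radius": 4.0, "height": 10.0}))
    handle = mug.add_primitive(Primitive("torus", {"major_radius": 3.0, "minor_radius": 0.5}))
    mug.connect(body, handle)

Matching can then operate on the primitive lists and on the edge sets rather than on raw points.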

In yet another aspect, the method comprises performing a first search match in a first database in which known objects are stored with known metadata associated with the known objects, the first search match being performed using 2D recognition techniques including at least one of Optical Character Recognition, SIFT based imaging, or color gradient analysis, and/or the first search match being performed on the metadata.

The steps of the method may be performed concurrently, until the object has been identified in at least one of the first database or the 3D database. In particular, pictures are acquired as long as the object has not been identified or until a time out has been reached, wherein database indexation of the 3D database and/or of the first database is updated each time a known object stored in the 3D database or in the first database is eliminated, in particular using metadata or bounding boxes representative of the dimensions of the object to be identified.

In an aspect, the step of acquiring a plurality of pictures comprises extracting said pictures from a video sequence, the method comprising dynamically adjusting the acquisition parameters depending on the 3D points cloud reconstruction, wherein pictures from the video sequence are saved every «n» frames, «n» being adjusted dynamically or by the user, in particular wherein n is given a higher value at the start of the method and decreases as the reconstruction becomes more accurate.

In another aspect, the step of reconstructing a 3D points cloud reconstruction of the object comprises extracting a plurality of key points that can be correlated in said plurality of pictures of the object, wherein at least two pictures of the plurality of pictures show at least two different viewpoints of the object, placing the key points on the object, defining a plurality of vertices of the object, wherein a vertex corresponds in 3D to a specific point identified in at least 3 pictures of the object, and adding the 3D vertices to build a reconstructed 3D points cloud of the object, to derive the 3D points cloud reconstruction of the object.

In this aspect, the present method is adapted to identify and treat essential 3D parameters extracted from the 3D reconstruction as key points of an object, such as peaks, tops, edges, shapes, reliefs, as well as its texture, colors, materials . . . .

The 3D reconstruction may include a step of denoising the 3D points cloud reconstruction, wherein the denoising includes sampling the 3D reconstructed space with a plurality of virtual voxels, counting the number of vertices contained in a virtual voxel, and deleting said vertices contained in the virtual voxel when the number of said vertices in the virtual voxel is below a vertex threshold. A size of the virtual voxel and the vertex threshold may be dynamically adjustable.
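The following minimal sketch illustrates this voxel-based denoising, assuming the cloud is an (N, 3) NumPy array; the voxel size and vertex threshold values are illustrative stand-ins for the dynamically adjusted parameters mentioned above.

    import numpy as np

    def denoise_point_cloud(points, voxel_size=0.05, vertex_threshold=3):
        # Sample the reconstructed space with virtual voxels by integer-dividing
        # each vertex coordinate by the voxel size.
        voxel_ids = np.floor(points / voxel_size).astype(np.int64)
        # Count the number of vertices contained in each virtual voxel.
        _, inverse, counts = np.unique(voxel_ids, axis=0,
                                       return_inverse=True, return_counts=True)
        # Delete the vertices of voxels holding fewer than `vertex_threshold` vertices.
        return points[counts[inverse] >= vertex_threshold]

    cloud = np.random.rand(1000, 3)
    cleaned = denoise_point_cloud(cloud, voxel_size=0.1, vertex_threshold=2)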

In yet another aspect, the step of reconstructing a 3D point cloud reconstruction of the object comprises extracting a plurality of key points that can be correlated in said plurality of pictures of the object, wherein at least two pictures of the plurality of pictures show at least two different viewpoints of the object, placing the key points on the object, defining a plurality of 3D slices of the object, wherein a 3D slice comprises at least one key point, and adding the 3D slices to build a reconstructed 3D points cloud of the object, to derive the 3D points cloud reconstruction of the object.

The method may comprise computing a calibration matrix in a reference frame to derive a relative measurement system, wherein the 3D slices are added in the obtained relative measurement system.

In an aspect, the method comprises defining an initial set of searchable points in a first picture of the plurality of pictures and identifying some of the searchable points in the remaining pictures of the plurality of pictures, to extract the key points.

Displaying in real time information pertaining to the method on the mobile device may be performed, wherein the mobile device may comprise an input device allowing a user to enter input data concerning the object to be identified, and wherein the first match search or the 3D match search are adapted depending on the input data.

The present disclosure's need to reconstruct objects in 3D to extract specific features for further recognition has led the inventors to combine and connect techniques of 2D tracking with 3D reconstruction. Techniques of “deep matching” that work at a sub-pixel scale are used to find 2D correspondences between 2D pictures at the pixel level, whereas SFM algorithms work on numerous neighboring pixels in an area defined by a radius around a central pixel.

The general operation of the method of object recognition of this disclosure is to observe an object with a device from as many angles as possible. The information acquired by the device is distantly computed and compared to information contained in an object database. As soon as a match is found, the object from the database is displayed. It is important to note that the object that is further used once recognized is the object from the database, not the one that has been captured.

The 3D approach of the present disclosure gives the opportunity to use 2D recognition techniques in all the view angles of an object, thus allowing watching and analyzing the object on all its sides and picking up every detail that will help to recognize the object. Unlike most approaches that aim to fully and densely reconstruct captured objects (3D scanning, facial recognition, printable 3D objects and formats), the present application uses calculated 3D parameters as a unique signature for an object. This is achieved using points cloud techniques, which allow a fast (within seconds) and efficient 3D representation of captured objects but also an accurate comparison with an existing 3D database. The open source “Point Clouds Libraries (PCL)” and the more recent “Geometry Factory Libraries” can be used for developing the software.

Should the object have an existing 3D representation, this representation can be displayed to the user in order to have a 3D interactive representation of the object; should this 3D pre-modeled object be available, it could be printed through a 3D printer, . . . .

In the present application the term “object” is used to designate anything that can be captured by the device. It can be any object: natural, artificial, articulated, soft, hard . . . as long as a picture/video can be shot or taken to represent said object.

Still other aspects, features, and advantages of the present invention are readily apparent from the following detailed description, simply by illustrating preferable embodiments and implementations. The present invention is also capable of other and different embodiments, and its several details can be modified in various obvious respects, all without departing from the spirit and scope of the present invention. Accordingly, the drawings and descriptions are to be regarded as illustrative in nature, and not as restrictive. Additional objects and advantages of the invention will be set forth in part in the description which follows and in part will be obvious from the description, or may be learned by practice of the invention.

DESCRIPTION OF THE DRAWINGS

The invention is described hereinafter with reference to the enclosed drawings, in which:

FIG. 1 is an overview of a system for object recognition in one aspect of the disclosure.

FIG. 2 is an overview of a method for object recognition according to one aspect of the disclosure.

FIGS. 3A-3E are examples of picture data at different stages of the method of FIG. 2.

FIG. 4 shows a level of 3D detail depending on the input number of pictures used in the method of FIG. 2.

FIG. 5 shows examples of an object and its 3D reconstruction according to one aspect of the disclosure.

FIG. 6 is an overview of a method for generating random pictures for 2D picture matching used for object recognition according to one aspect of the disclosure.

FIG. 7 shows a representation of space which can be used in a method of object recognition according to one aspect of the disclosure.

FIG. 8 shows a method of segmentation which can be used in a method of object recognition according to one aspect of the disclosure.

FIG. 9 shows an example of compression usable in a method according to one aspect of the present disclosure.

FIGS. 10 and 11 are overviews of a method for object recognition in another aspect of the disclosure.

FIG. 12 is an overview of a system for object recognition according to one aspect of the disclosure.

FIGS. 13A-13E are examples of picture data at different stages of the method of FIG. 11.

FIG. 14 is an example of an SFM matching technique as known in the art.

FIG. 15 is an example of a deep matching technique as known in the art.

FIG. 16 is an overview of a method for combining SFM matching techniques and deep matching techniques in an aspect of the disclosure.

FIG. 17 is an example of volume computation in one aspect of the disclosure.

FIG. 18 gives an example of decimation of a database during object recognition in an aspect of the disclosure.

FIGS. 19A-19E are an example of different steps of the method of object recognition in one aspect of the disclosure.

FIGS. 20A-20C show denoising and cleaning usable in the method of FIGS. 19A-19E in an aspect of the disclosure.

FIGS. 21 and 22 illustrate descriptors and primitives which can be used in a method according to the invention.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 is an overview of the system 1 for object recognition of an object 50. The system 1 comprises a device 10, which is used by an end user. The device 10 preferably comprises a display screen, a camera and a video camera, an embedded CPU, storage capacities, and a connection to a network. The device 10 may have connections to existing platforms 2, such as M-commerce, 3D printing, CRM, or social networks.

For example, the device 10 can be a smartphone, a tablet, a laptop with a web cam, a computer, or the like. As will be understood by the skilled person, smartphones and tablets are the most efficient devices for the method of the present invention.

The device 10 is connected to a cloud server 20. The cloud server 20 comprises distant CPU or GPU facilities. The cloud server 20 can mainly be provided by recognized actors in the domain, such as Azure (Microsoft), AWS (Amazon), Cisco, Google, HP, or more specialized cloud computing providers, as long as the providers offer efficiency, security and a worldwide presence. In one aspect of the disclosure, the power and efficiency of the cloud server 20 can be adapted to the amount of calculation to be processed.

The cloud server 20 has a connection to a storage server 30. The storage server 30 is a distant storage involving both object and user data, as will be explained in the present disclosure. The storage server 30 comprises a first database 35 and a second database 38. The first database 35 comprises stored 2D images. The second database 38 comprises stored 3D files of images.

FIG. 2 is an overview of a method for object 50 recognition according to one aspect of the invention, described with reference to the system shown on FIG. 1.

The method comprises the step of acquiring a plurality of pictures 1000 of an object 50 (node 1.3). In one aspect of the invention, pictures can be acquired or captured by the device 10. Two different acquisition modes can be used: extraction from video or burst mode. In the burst mode, pictures are taken in photographic mode as quickly as the device 10 allows it. The skilled person will understand that the acquisition of pictures using extraction from video is more automatic but also more space and CPU consuming. Pictures from a video sequence shot by the video camera of the user's device can be saved every «n» frames, «n» being adjusted dynamically or by the user, representing in some way the «quality» of the 3D scanning. For example, if n<10 frames, the quality is better but the process is slower. On the other hand, if n>50 frames, the scanning is of lower quality but the process is faster. In an aspect of the disclosure, «n» is dynamically adjusted, starting with a high value (+/−50 frames) and decreasing as the reconstruction becomes more accurate (+/−10 frames).
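A minimal sketch of this «n»-frame sampling is given below, using OpenCV's VideoCapture; the reconstruction_quality callback (returning a value between 0 and 1) is a hypothetical placeholder for the accuracy feedback described above.

    import cv2

    def extract_frames(video_path, reconstruction_quality, n_start=50, n_min=10):
        capture = cv2.VideoCapture(video_path)
        n, index, frames = n_start, 0, []
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            if index % n == 0:
                frames.append(frame)
                # «n» starts high and decreases towards n_min as the
                # reconstruction becomes more accurate.
                n = max(n_min, int(round(n_start * (1.0 - reconstruction_quality()))))
            index += 1
        capture.release()
        return frames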

The burst mode is more “clever” and should require some fine computing to select “proper” frames, i.e. frames that are useful for a more accurate 3D “cloud points” reconstruction. Examples of pictures 1000 are shown on FIG. 3A.

It should be noted that objects should preferably be shot from various angles. In the case of big objects or objects that cannot be turned around, like monuments, only specific angles can be used. The different pictures 1000 of the object 50 represent different views of the object, from different viewpoints.

The pictures 1000 are acquired until a full set of pictures is obtained.

A set of pictures may be considered complete after a certain acquisition time. For example, an acquisition time estimated at 10-15 seconds might be enough. Preferably, an overall time out can be set to avoid infinite looping in the process.

In one aspect of the disclosure, the length of the acquisition time is dynamic and may be adapted depending on a 3D points cloud reconstruction, as will be explained later in this disclosure with reference to nodes 2.3 to 2.9.

The device 10 sends the acquired plurality of pictures 1000 to the cloud server 20 for cloud computing (node 2.1). As will be explained in the following, a first 2D search match in the databank 35 and/or a 3D cloud reconstruction followed by a second 3D search match is performed.

The cloud server 20 forwards the plurality of pictures 1000 to the storage server 30.

A first search match in the database 35 may be performed in order to match the acquired pictures with a known image 1005 stored in the first database 35 (node 3.1).

The first search match is based on 2D image recognition techniques. Those 2D recognition techniques are implemented in the matching algorithm. Different 2D recognition techniques can be implemented, such as open source techniques. The 2D recognition techniques include at least one of O.C.R. (node 3.1.1), Scale Invariant Feature Transform (SIFT) based image matching, i.e. automatic recognition of key elements in a picture (node 3.1.2), or color gradient analysis (node 3.1.3) giving a precise color map of the object 50. Geolocation information (node 3.1.4) may be used as well.

Each time a non-fitting object stored in the database 35 is eliminated by either one of these techniques, database indexation is updated in order to ease the overall process.

Node 3.1.5 and FIG. 6 describe an original approach referred to as “2D reverse projections from 3D objects”. The 3D stored models of objects or 3D files of images in the database 35 are provided with complete metadata describing the object. For example, the metadata comprises the following identification data: name, brand, description, size, 2D parameters (colors gradients or maps, o.c.r. data, histograms, Fourier Transformations, samplings . . . ), 3D parameters (points cloud representations, triangulation, textures, materials, size, intrinsic dimensions . . . ). Among these parameters, it is assumed that the 3D representation of the objects generates numerous “random” 2D pictures for an object. This “in house” piece of code generates a plurality of 2D pictures 2000 rendered from the 3D stored model, in order to simulate as many users' captures as possible. This includes different random lightings, different random points of view, different random exposures . . . and thus simulates a user's capture. The generated 2D pictures are then compared to the captured ones, for example through the Hausdorff distance or the Kullback-Leibler distance.

Therefore, the 2D reverse projection from 3D objects is adapted to simulate the capture of the object 50 by the user and to propose as many “artificial” pictures 2000 as possible to compare them to the picture set 1000 of the object 50 sent by the user. The 2D comparison of the artificial pictures 2000 and of the acquired pictures 1000 is processed along the other techniques for final matching.
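For illustration, the two distances named above can be computed with standard SciPy routines; the sketch below assumes the pictures have already been reduced to 2D point sets (e.g. edge pixels) and color histograms, which is a simplification of the full comparison.

    import numpy as np
    from scipy.spatial.distance import directed_hausdorff
    from scipy.stats import entropy

    def hausdorff_score(points_a, points_b):
        # Symmetric Hausdorff distance between two 2D point sets.
        return max(directed_hausdorff(points_a, points_b)[0],
                   directed_hausdorff(points_b, points_a)[0])

    def kl_score(hist_a, hist_b, eps=1e-9):
        # Kullback-Leibler divergence between two color histograms
        # (entropy() normalizes its inputs; eps avoids zero bins).
        return entropy(hist_a + eps, hist_b + eps)

A low score on either distance indicates that an artificial picture 2000 resembles one of the acquired pictures 1000.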

If the first search match is successful, the storage server 30 sends an item 1010 of relevant information belonging to the known image 1005 back to the cloud server 20, which forwards said item 1010 to the device 10 for display.

The item 1010 may comprise identification information of the object, a picture of the object, localization information of the object, and the like.

If the first search match is not successful, the storage server 30 returns the information to the cloud server 20 that the first search match was not successful (node 3.12). The cloud server 20 starts a 3D reconstruction process in order to obtain a cloud of 3D points from the 2D picture set, followed by a 3D search match.

This 3D reconstruction process, done by the cloud server 20, is shown in nodes 2.3 to 2.9. As will be detailed below, the 3D reconstruction process includes an identification of the 2D picture set, a 2D tracking in the pictures, a 3D point set placement and a 3D points cloud reconstruction. openMVG libraries may be used, or any libraries known to the skilled person.

The pictures 1000 are analyzed at node 2.3 to identify the pictures 1000 and extract identification information pertaining to the picture set, such as the acquisition mode, the length of frame acquisition, and the identification of frames for the 3D cloud reconstruction.

Using this identification information, a key point extraction process is launched on the pictures 1000, in order to extract key points 1030. Key points are defined as being points that can be correlated in as many pictures as possible.

The key points 1030 are identified by a 2D tracking process throughout all the pictures 1000 of the set of pictures, in which each point from a picture is identified in other pictures. If the pictures were acquired through a video, the pictures correspond to frames of the video. In other words, an initial set of searchable points 1032 is defined in a first picture, and the 2D tracking process tries to identify the searchable points 1032 in the other pictures of the set to extract the key points 1030. This is shown on FIGS. 3B and 3C.

The searchable points are refined throughout the process. Points are added, others are suppressed. During the key point extraction process, the set of key points is compared to the initial set of searchable points. Should the number of key points be too low, other searchable points would have to be added to the initial set of searchable points in order to be tracked again. There is no real minimum in the number of key points to be tracked, but the 3D reconstruction and the following comparison process are more efficient with dozens of points, as illustrated on FIG. 4 showing a level of 3D detail depending on the input number of pictures. A sketch of such tracking is given below.
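The following minimal sketch uses standard OpenCV routines as a stand-in for the actual tracking implementation; the detector and optical-flow parameters are illustrative.

    import cv2

    def track_key_points(gray_frames):
        # Initial set of searchable points, defined in the first picture.
        points = cv2.goodFeaturesToTrack(gray_frames[0], maxCorners=500,
                                         qualityLevel=0.01, minDistance=7)
        tracks = [points]
        for prev, curr in zip(gray_frames, gray_frames[1:]):
            if points is None or len(points) == 0:
                break
            points, status, _ = cv2.calcOpticalFlowPyrLK(prev, curr, points, None)
            # Suppress points that could not be found in the current picture.
            points = points[status.flatten() == 1].reshape(-1, 1, 2)
            tracks.append(points)
        # Key points are the points correlated in as many pictures as possible.
        return tracks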

The skilled person will further understand that it is important that the object 50 be motionless while being captured, to allow a successful key point extraction process.

In order to reconstruct the 3D points cloud, the “virtual world” is preferably calibrated to obtain a relative measurement system. Indeed, the system will generally not be able to calculate absolute dimensions from the picture set unless there is, in at least one picture, a distance reference, i.e. an object 50 whose size/dimensions are known. Most of the time, this will not be the case. However, the object 50 will have coherent dimensions, although it will not have the right absolute size. For example, should the end user scan a mug, the system will recognize that the object 50 is a mug but won't be able to determine if this is a regular mug or the same mug in a giant version that could be exposed in front of a store as an advertising totem. Nevertheless, the system will send back an “OK” to the user, considering the mug is a regular one that can be found, bought, shared.

This calibration is made using triangulation algorithms. If two or more cameras whose positioning in space is well known see a specific point, triangulation based on elementary trigonometric formulas can determine the exact position of this specific point in space (i.e. in 3 dimensions). In the reverse process, if one tracked point is seen from different viewpoints (even though these different viewpoints are given by one single moving device), these different viewpoints can be positioned in a 3D space relative to the tracked points, and thus to the captured object.

The calibration is done at node 2.5, in which a camera calibration is done using matching 2D points, as explained below.

In order to reconstruct 3D points clouds from sets of pictures, the 2D pictures should be placed back in a 3D environment, by providing the answers to the following questions: where in space are the pictures taken from, and where in space are the 2D tracked points located.

The geometrical system at the time of capture can be represented as on FIG. 6.

The device 10 is represented here through its optical center O and its focal plane (“image plane”). The image of the object 50 is made of numerous points P(X,Y,Z). The correspondence between the camera “C” and the object “P” is given by the following formula: Pc = CP, where Pc is the projection of P on the image plane and C is the complete camera calibration matrix. The calibration matrix C is related to the device 10 and remains the same for a whole capture session. For example, C can be a 3×4 matrix (12 unknowns).

The method for reconstruction is thus to calculate the calibration matrix C (calibrating the camera) in a reference frame and then to apply the transformation to other frames in order to position as many P points as possible in space. It should be noted that the point P has 3 coordinates and is thus positioned in a 3D space.

The calibration matrix C is calculated knowing a few correspondences between 3D points and their 2D projections on the camera image plane. The 2D projection coordinates are known in the image plane, while the 3D coordinates are also known in an arbitrary 3D space (i.e. P could be considered, for example, as the center of the 3D world). Pc = CP provides 2 equations containing 12 unknowns, meaning that at least 6 correspondences must be known in order to solve for C. Those correspondences are determined using fiducial based image processing methods.

Once the calibration matrix C is known, a point Q in space can be found through the reverse equation Q = C⁻¹Qc, where C and Qc are known. Q has 3 coordinates that are 3 unknowns. It thus requires another point of view with the same camera to solve the system and position Q in the 3D space.

These calculations are made without any indication of the real dimensions in space. The reconstructed objects have the right geometry but there is no indication about their sizes, unless there is, in the camera field of view, another object whose dimension is well known. This is, however, not a prerequisite for the present disclosure.

Computing tools on geometry and trigonometry can be found in open source libraries (like OpenCV), available in open source since June 2000. Those libraries provide numerous tools for digital picture analysis, such as automatic 3D camera calibration matrix calculation (calibrateCamera, calibrationMatrixValues . . . ) or 3D triangulation from different 2D pictures (triangulatePoints).
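The sketch below shows how those two OpenCV tools chain together for the reconstruction described above; the input correspondences are placeholders that a real pipeline would obtain from the fiducial-based calibration and from the 2D tracking.

    import cv2
    import numpy as np

    def reconstruct_vertices(object_points, image_points, image_size, pts1, pts2):
        # object_points/image_points: at least 6 known 3D/2D correspondences,
        # seen in two views; pts1/pts2: the same tracked 2D points (2xN arrays)
        # observed from the two viewpoints.
        _, K, dist, rvecs, tvecs = cv2.calibrateCamera(
            object_points, image_points, image_size, None, None)
        # Build the projection matrix of each viewpoint from the solved poses.
        R1, _ = cv2.Rodrigues(rvecs[0])
        R2, _ = cv2.Rodrigues(rvecs[1])
        P1 = K @ np.hstack([R1, tvecs[0].reshape(3, 1)])
        P2 = K @ np.hstack([R2, tvecs[1].reshape(3, 1)])
        # Each tracked point seen from the two viewpoints becomes a 3D vertex.
        pts_4d = cv2.triangulatePoints(P1, P2, pts1, pts2)  # homogeneous 4xN
        return (pts_4d[:3] / pts_4d[3]).T                   # Euclidean (N, 3)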

Once the calibration is done, the key points 1030 identified in the key point extraction step are placed on the object 50. This is illustrated on FIG. 3D. The 3D points cloud reconstruction is thereafter made “3D slice” by “3D slice” in the obtained relative measurement system, at nodes 2.7 and 2.8. Those 3D slices are added together to build the reconstructed 3D points cloud 1050, as seen on FIG. 3E.

A 3D slice comprises the key points identified from the pictures 1000 for a specific plane.

The skilled person will understand that this slice by slice 3D cloud reconstruction process can be compared to the process of printing a regular 2D document line after line with a regular inkjet printer. It is also the exact same process as printing a 3D object 50 “slice by slice” while the tray sustaining the printed object 50 goes down each time the print heads pass over the previous slice.

The result of the 3D points cloud reconstruction is a file comprising a reconstructed 3D points cloud 1050 in a format understandable to 3D software. A standard file format is the .ply file, which is a regular file format for 3D files. Most 3D software understands and generates this format from and to all other 3D formats (obj, stl, 3DS max, ma, mb . . . ). The ply format is also very efficiently compressible (non-destructive) and transportable through the network, although this is not really an issue here since the 3D points cloud reconstruction and the 3D points cloud comparison are both computed server side. Examples of successfully reconstructed ply files are given in FIGS. 5A-5C, showing examples of the object 50 and the associated reconstructed points cloud 1050.
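Since the ASCII ply format is a simple header followed by one line per vertex, a reconstructed points cloud can be written in a few lines; the sketch below is a minimal writer restricted to x, y, z vertices.

    def write_ply(path, vertices):
        # Write a reconstructed points cloud as an ASCII .ply file.
        with open(path, "w") as f:
            f.write("ply\nformat ascii 1.0\n")
            f.write(f"element vertex {len(vertices)}\n")
            f.write("property float x\nproperty float y\nproperty float z\n")
            f.write("end_header\n")
            for x, y, z in vertices:
                f.write(f"{x} {y} {z}\n")

    write_ply("object_1050.ply", [(0.0, 0.0, 0.0), (1.0, 0.5, 0.2)])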

The reconstructed 3D points cloud 1050 is forwarded to the storage server 30 for a 3D match search. The 3D match search is done with a 3D points cloud comparison made using the ply files. The comparison compares the user-generated ply file 1050 with known ply files 1052 stored in the 3D database 38. It should be noted that the database ply files 1052, associated with each known object stored in the database, are automatically generated from their 3D models regardless of their original formats, because ply files can easily and automatically be generated from most regular file formats. It should also be noted that the 3D search match process starts as soon as some 3D points are identified. The 3D search match is then enriched with new reconstructed 3D points as long as the recognition process is going on (i.e. no match is found), giving more and more precision and weight to the 3D part of the recognition.

Two main methods can be used to perform the comparison: 3D geometric comparison or machine learning. The skilled person is aware that 3D geometric comparison is rapidly efficient. Alternatively, solutions may be chosen between using existing libraries such as the “Points Cloud Libraries” or the “Geometry Factory” libraries, which embed root algorithms like point source ray projections, principal component analysis in Eigenspace projections or local sensitivity hashing. Those libraries and root techniques can be applied to compare ply files and find a match, but also to efficiently eliminate non-fitting database objects from the identification process, which is almost as important in the matching process.
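As a simple illustration of a 3D geometric comparison (a minimal stand-in for the library algorithms named above, not the patented matching), two clouds can be scored by their mean nearest-neighbor distance using SciPy's cKDTree:

    import numpy as np
    from scipy.spatial import cKDTree

    def cloud_distance(cloud_a, cloud_b):
        # Mean distance from each vertex of one cloud to the closest vertex
        # of the other, taken symmetrically; lower means a better match.
        d_ab, _ = cKDTree(cloud_b).query(cloud_a)
        d_ba, _ = cKDTree(cloud_a).query(cloud_b)
        return max(d_ab.mean(), d_ba.mean())

    def best_match(user_cloud, database_clouds):
        scores = [cloud_distance(user_cloud, c) for c in database_clouds]
        return int(np.argmin(scores)), min(scores)

The same score can also be used to eliminate non-fitting database objects early, by discarding any known cloud whose score exceeds a threshold.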

Machine learning is also very efficient, although it needs a high amount of inputs associated with outputs to give good results. Fortunately, the method of the present disclosure allows this high amount of data since the database objects contain a 3D representation. It is possible to randomly generate a big amount of ply files of any detail level and match them with the known original object 50. This machine learning approach relies on AI algorithms such as HOG linear (Histogram of Oriented Gradients), or a cascade classifier of Haar features. It certainly requires an important calculation power since those neural network based techniques are exponential in terms of calculation, but this process can be dealt with independently and upstream of the recognition process.

The 3D points cloud reconstruction obtained from pictures, as shown on FIG. 5, allows the use of the 3D envelope to do “segmentation” on the reconstructed object. In other words, the 3D object is used in each picture that has been part of the 3D reconstruction to isolate the object in the picture. This is shown on FIG. 8. A matching 3D object from the 3D database 38 is used to isolate relevant information and obtain a histogram 2010 of the segmented picture. The histogram 2010 of the segmented picture can be compared to histograms 2020 of objects in the database 38 and becomes a criterion of comparison.
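A minimal sketch of this histogram criterion is given below, using OpenCV's calcHist and compareHist; the mask is assumed to come from the 3D-envelope segmentation described above.

    import cv2

    def segmented_histogram(image_bgr, mask):
        # Histogram computed only on the pixels kept by the segmentation mask.
        hist = cv2.calcHist([image_bgr], [0, 1, 2], mask, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        return cv2.normalize(hist, hist).flatten()

    def histogram_similarity(hist_a, hist_b):
        # Correlation method: 1.0 means identical histograms.
        return cv2.compareHist(hist_a, hist_b, cv2.HISTCMP_CORREL)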

This segmentation offers better performances on the matching algorithms described in this disclosure, as for example in O.C.R. (character recognition), where only relevant characters are kept in the analysis, or in color analysis, giving much more accurate histograms as described on FIG. 8. The skilled person will understand that the method for recognition is an ongoing process. It means that during the capture of the picture data, pictures are sent for computing (nodes 1.3 & 2.1). Hence, first treatments of the first pictures are computed to obtain a computed object 50 while further picture data are being acquired for the same object 50 to be identified. Indeed, the skilled person will understand that pictures are taken as long as necessary, meaning as long as the object 50 has not been identified (although an overall time out can be set, as explained above). Hence, as noted above, the length of the acquisition time is dynamic and may be adapted depending on the 3D points cloud reconstruction made from the dynamic picture set. Thus, if the computed points cloud is not sufficient in terms of number of points, the length of the frame acquisition is extended. The gyroscope/accelerometer, if available on the device, can also be used to fill up empty areas with 2D pictures. For example, it has been established so far that a minimum of 20 pictures is required. Best results are obtained if the angle between two pictures is rather small, about 1 degree; thus, 20 to 30 pictures are required for a 20 to 30 degree acquisition. An overall time out can be set to avoid infinite looping in the process.

In one aspect, regular picture compression algorithms are used to speed up this step of picture computing. These algorithms are non-destructive in order to optimize the frame by frame treatments. For example, non-destructive image compression is used in image formats such as “png”, “tiff”, “gif”, “jpeg2000”. The regular picture compression algorithms are adapted from open source algorithms, such as entropy coding or dictionary based compression algorithms. This item also includes server side communications between the “cloud server” and the “cloud storage” (node 2.1).

Entropy coding is a lossless data compression method that gives a specific code to a specific piece of information, this code being easier to transport than the original coding.

For example, assuming a picture of a car contains 12 M pixels with 10 M red pixels, the entropy coding will assign the value “1” to the red color instead of the (255,0,0) “usual” color codification. Usual and efficient algorithms that can be easily implemented are “Huffman coding” and “Shannon-Fano coding”, a precursor of Huffman coding.

Another compression method could be the Lempel-Ziv-Welch (LZW) algorithm. This method of compression assumes that the item to encode is available as a character chain, which is the definition of any digital signal. The LZW algorithm encodes sequences of characters by creating new characters in a “character dictionary” from read sequences, as seen on the tables of FIG. 9.

The dictionary starts with 2 characters: 0 and 1. While reading the first character “1”, the algorithm will find the new character “10” made of the 2 first characters of the original chain and will add it to the dictionary (character #2). While reading the second character “0”, it will add the new character “00” to the dictionary (character #3). While reading the 3rd character of the chain, it will add “01” to the dictionary (character #4). While reading the 4th character, it will add “11” (character #5) to the dictionary. The 5th and 6th characters are “1” and “1”, which is character #5 of the dictionary. In the meantime, “110” is added to the dictionary as character #6. The compression continues further in the same manner. In the end, the original chain of 15 items is coded with a chain of 8 items.
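A minimal LZW encoder matching the walkthrough above can be written as follows; the example chain is illustrative and not necessarily the one shown in FIG. 9.

    def lzw_encode(chain, alphabet=("0", "1")):
        # The dictionary starts with the characters "0" and "1".
        dictionary = {ch: i for i, ch in enumerate(alphabet)}
        w, output = "", []
        for ch in chain:
            if w + ch in dictionary:
                w += ch                       # keep extending a known sequence
            else:
                output.append(dictionary[w])  # emit the code of the known sequence
                dictionary[w + ch] = len(dictionary)  # add a new character
                w = ch
        output.append(dictionary[w])
        return output

    # A 15-character chain is encoded as a shorter chain of dictionary codes.
    print(lzw_encode("110011011000101"))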

In one embodiment, server side computing involves many techniques processed simultaneously in order to eliminate non-fitting objects from the object databases 35, 38. Each time a non-fitting object is eliminated, the technique used to eliminate this non-fitting object is remembered, thus giving a weight to the efficiency of this technique for this object 50 to be identified. This weight is then used to prioritize and speed up the process. The weight is also stored for further statistics. For example, should an object 50 have characters on it, all the known objects stored in the database without characters are immediately eliminated; should the red color be identified in an object 50, all known objects without red stored in the database would be eliminated.

Another example is the QR-code or bar-code: should the object 50 have one of those, the match would immediately be found and displayed. This specific embodiment is not the purpose of the present disclosure but is given as an example of the recognition process.

It is important to understand that the present system and method are not meant to obtain a dense 3D reconstruction of the object 50. However, the 3D points cloud reconstruction can be computed with efficiency and accuracy from several views of the object 50. This is a tradeoff between accuracy and resources: the more views, the more accuracy in the points cloud but the more calculation to compute.

Once the object 50 has been identified after the match search in either the first database 35 or the 3D database 38, the information is returned to the device 10, for display and/or further action on the device 10 under at least one of many forms: 3D interactive representation compatible with all devices, available metadata, 3D printable compatible export . . . . This also includes all social network sharing and usual search engines, since text metadata is also embedded with the object 50.

The method for recognition is preferably shown in real time to the user through a user friendly interface. The main parameter is the number of objects still matching from the database. The process ends “OK” when only one object 50 is found, and “KO” when no match is found or on time out, as explained above. Nevertheless, the user can be asked to help the matching process through simple “MCQ” (Multiple Choice Questions) to ease the recognition (node 4.2). Those questions/answers can be very simple: size/dimension, material, brand, family of object 50 (food, accessory, car . . . ), accuracy of 2D capture . . . . Those questions can be asked according to at least one of the ongoing process, previous decimations in the objects database and the remaining objects' metadata.

FIGS. 10 and 11 are diagrams of methods for recognition in a further aspect of the disclosure, and FIG. 12 is an overview of the system 201 for object recognition of an object 50.

The system 201 comprises a device 210, which is used by an end user. The device 210 preferably comprises a display screen, a camera and a video camera, an embedded CPU, storage capacities, and a connection to a network. The device 210 may have connections to existing platforms 202, such as M-commerce, 3D printing, CRM, or social networks.

For example, the device 210 can be a smartphone, a tablet, a laptop with a web cam, a computer, or the like. As will be understood by the skilled person, smartphones and tablets are the most efficient devices for the method of the present invention.

The device 210 is connected to a cloud server 220. The cloud server 220 comprises distant CPUs, GPUs or any so-called “virtual machine” facilities that can be provided and useful to the improvement of the performances of the invention. This includes, for example, new generations of processing units dedicated to machine learning, like Google TPUs (Tensor Process Units). The cloud server 220 can mainly be provided by recognized actors in the domain, such as Azure (Microsoft), AWS (Amazon), Cisco, Google, HP, or more specialized cloud computing providers, as long as the providers offer efficiency, security and a worldwide presence. In one aspect of the disclosure, the power and efficiency of the cloud server 220 can be adapted to the amount of calculation to be processed.

The cloud server 220 has a connection to a storage server 230. The storage server 230 is a distant storage involving both object and user data, as will be explained in the present disclosure. The storage server 230 comprises a first database 235 and a second database 238. The first database 235 comprises stored 2D images. The second database 238 comprises stored 3D files of images.

The method for object recognition will now be described with reference to the diagram flow of FIG. 11. The method for object recognition comprises the step of acquiring a plurality of pictures 5000 of an object 50 (node 21.3). In one aspect of the invention, pictures can be acquired or captured by the device 210. Two different acquisition modes can be used: extraction from video or burst mode. In the burst mode, pictures are taken in photographic mode as quickly as the device 210 allows it. The skilled person will understand that the acquisition of pictures using extraction from video is more automatic but also more space and CPU consuming. Pictures from a video sequence shot by the video camera of the user's device can be saved every «n» frames, «n» being adjusted dynamically or by the user, representing in some way the «quality» of the 3D scanning. For example, if n<10 frames, the quality is better but the process is slower. On the other hand, if n>50 frames, the scanning is of lower quality but the process is faster. In an aspect of the disclosure, «n» is dynamically adjusted, starting with a high value (+/−50 frames) and decreasing as the reconstruction becomes more accurate (+/−10 frames).

The burst mode is more “clever” and should require some fine computing to select “proper” frames, i.e. frames that are useful for a more accurate 3D “cloud points” reconstruction.

It should be noted that objects should preferably be shot from various angles. In the case of big objects or objects that cannot be turned around, like monuments, only specific angles can be used.

The different pictures 5000 of the object 50 represent different views of the object, from different viewpoints.

The pictures 5000 are acquired until a full set of pictures is obtained.

A set of pictures may be considered complete after a certain acquisition time. For example, an acquisition time estimated at 10-15 seconds might be enough. Preferably, an overall time out can be set to avoid infinite looping in the process.

In one aspect of the disclosure, the length of the acquisition time is dynamic and may be adapted depending on a 3D points cloud reconstruction, as will be explained later in this disclosure with reference to nodes 22.3 to 22.9.

The device 210 sends the acquired plurality of pictures 5000 to the cloud server 220 for cloud computing (node 22.1). As will be explained in the following, a first 2D search match in the database 235 and/or a 3D cloud reconstruction followed by a second 3D search match is performed.

The 2D search can be performed as soon as the pictures are received by the system. The 3D search can start as soon as reconstructed 3D points are available in reasonable quantity. At some point, those two types of search will be performed simultaneously.

The cloud server 220 forwards the plurality of pictures 5000 to the storage server 230.

A first search match in the database 235 may be performed in order to match the acquired pictures with a known image 5005 stored in the first database 235 (node 23.1), including every piece of information that can be useful for recognition.

The first search match is based on 2D image recognition techniques. Those 2D recognition techniques are implemented in the matching algorithm. Different 2D recognition techniques can be implemented, such as open source techniques. The 2D recognition techniques include at least one of O.C.R. (node 23.1.1), Scale Invariant Feature Transform (SIFT) based image matching, i.e. automatic recognition of key elements in a picture (node 23.1.2), or color gradient analysis (node 23.1.3) giving a precise color map of the object 50. Geolocation information (node 23.1.4) may be used as well.

In one aspect, various match searches in the objects' metadata in the database may be performed. The metadata associated with each object in the database includes, if relevant and/or available, geolocation, texts that appear on objects, QR codes or bar codes, color histograms of the object, name/model/description of the object, dimensions of the object, public links, price, serial number, reseller info, related company info . . . .

Each time a non-fitting object stored in the database 235 is eliminated by either one of these techniques, database indexation is updated in order to ease the overall process.

Node 23.1.5 describes an original approach referred to as “2D reverse projections from 3D objects”, explained in FIG. 6 with reference to the first embodiment. The 3D stored models of objects or 3D files of images in the database 235 are provided with complete metadata describing the object. For example, the metadata comprises the following identification data: name, brand, description, size, 2D parameters (colors gradients or maps, o.c.r. data, histograms, Fourier Transformations, samplings . . . ), 3D parameters (points cloud representations, triangulation, textures, materials, size, intrinsic dimensions . . . ). Among these parameters, it is assumed that the 3D representation of the objects generates numerous “random” 2D pictures for an object. A plurality of 2D pictures 6000 are generated and rendered from the 3D stored model, in order to simulate as many users' captures as possible. This includes different random lightings, different random points of view, different random exposures . . . and thus simulates a user's capture. The generated 2D pictures are then compared to the captured ones, for example through the Hausdorff distance or the Kullback-Leibler distance.

It should be noted that the 3D models in the database and the automatic generation of random pictures can be used for 2D machine learning training. Machine learning algorithms need an important amount of data for training/learning, and 3D models can provide those in quantity and quality. Using random backgrounds for these CGIs can also be implemented, since it is important that the algorithms learn objects in the foreground, not objects in the background. Thus, thousands of pictures can easily be generated in various conditions and constitute very valuable and accurate inputs for machine learning training.

Therefore, the 2D reverse projection from 3D objects is adapted to simulate the capture of the object 50 by the user and to propose as many “artificial” pictures 6000 as possible to compare them to the picture set 5000 of the object 50 sent by the user. The 2D comparison of the artificial pictures 6000 and of the acquired pictures 5000 is processed along the other techniques for final matching. Mixing those pictures with real ones issued from users' capture sessions is also a powerful way of training 2D machine learning algorithms.

Returning to the step of performing the first search and/or metadata matches in the database 235, if the first search match and/or the search in the metadata is successful, the storage server 230 sends an item 5010 of relevant information belonging to the known image 5005 back to the cloud server 220, which forwards said item 5010 to the device 210 for display.

The item 5010 may comprise metadata associated with the object, such as identification information of the object, a picture of the object, localization information of the object, and the like. Other object metadata may comprise, if relevant and/or available, geolocation, texts that appear on objects, QR codes or bar codes, color histograms of the object, name/model/description of the object, dimensions of the object, public links, price, serial number, reseller info, related company info . . . .

If the first search match is not successful, the storage server 230 returns the information to the cloud server 220 that the first search match was not successful (node 23.12). The cloud server 220 starts a 3D reconstruction process in order to obtain a cloud of 3D points from the 2D picture set, followed by a 3D search match. It is to be noted that the 3D process will also start by itself whenever sufficient 3D points clouds can be reconstructed from 2D pictures. A minimum of 5000 vertices in a point cloud should be considered in order to have reliable information regarding the size of objects.

This 3D reconstruction process, done by the cloud server 220, is shown in nodes 22.3 to 22.9. As will be detailed in the following, the 3D reconstruction process includes an identification of the 2D picture set, a 2D tracking in the pictures, a 3D point set placement and a 3D points cloud reconstruction. openMVG libraries may be used, or any libraries known to the skilled person. The method comprises defining an initial set of searchable points in a first picture of the plurality of pictures and identifying some of the searchable points in the remaining pictures of the plurality of pictures, to extract the key points and match them throughout the whole set of pictures. In addition to these “key point based” algorithms, “pixel by pixel” matching is performed through the set of frames (“deep matching”). As a result of this method, a matching file is generated, giving correspondences between pixels in pairs of pictures. The present invention proposes combining accurate techniques of deep matching with fast and efficient reconstruction algorithms based on Structure From Motion research for the identification of key points.

The pictures 5000 are processed at node 22.3 to identify the pictures 5000 and extract identification information pertaining to the picture set, such as the acquisition mode, the length of frame acquisition, and the identification of frames for the 3D cloud reconstruction.

Using this identification information, a key point extraction process is launched on the pictures 5000, in order to extract key points 5030. Key points are defined as being points that can be correlated in as many pictures as possible.

Known performant algorithms to perform “key point” based 3D reconstruction are issued from “Structure From Motion” (SFM) techniques. Nevertheless, these algorithms require 2D pictures that must obey specific constraints: various textures on objects, lighting parameters, differences in pictures (too much overlapping between pictures introduces biases in the reconstructed movements of the 3D virtual reconstructed camera, while too many differences will lead to poor 2D matching).

As known in the art, SFM requires as inputs full lists of scale invariant feature transform (SIFT) features describing the static and dynamic environments of spotted pixels from one frame to another. SFM matching includes feature description, since only relevant pixels are tracked and associated with the corresponding ones in paired pictures. More precisely, pixels are described, among others, by their immediate environments, and data are extracted as 128 features describing the surrounding pixels.
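For reference, such 128-feature descriptors can be produced with OpenCV's SIFT implementation; the sketch below assumes a grayscale input image (opencv-contrib may be required depending on the OpenCV version).

    import cv2

    def extract_sift(gray_image):
        sift = cv2.SIFT_create()
        key_points, descriptors = sift.detectAndCompute(gray_image, None)
        # Each descriptor row holds the 128 features describing the
        # environment surrounding one key point; shape is (N, 128).
        return key_points, descriptors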

FIG. 14 shows a first picture (upper) and a second picture with nine pixels each, and describes the structure of the SFM format for the matching file of the upper picture. In this example, only one matching pixel between the two pictures is found: pixel 5 in the upper picture corresponds to pixel 5 in the second picture. This is the only match, since the features, including the neighborhood, describing for example the pixels numbered 3 and 9 in the second picture are too different from the features describing the pixels numbered 3 and 9 in the first picture (upper picture).

Deep matching, working at the pixel level, gives raw lists of matching pixels (x_(i,n); y_(i,n)) and (x_(j,p); y_(j,p)) in different frames, where n and p are the indexes of 2 pixels in two different frames i and j. Deep matching does not include feature description, since every pixel is tracked and associated with the corresponding ones in paired pictures. Deep matching generates a matching file between pictures.

FIG. 15 describes a first and a second picture with nine pixels each and the structure of the deep matching result. In the example of FIG. 15, four matches are found: pixel numbered 3 in the upper picture matches pixel numbered 2 in the second picture, pixel numbered 5 in the upper picture matches pixel numbered 4 in the second picture, pixel numbered 6 in the upper picture matches its corresponding pixel in the second picture, and pixel numbered 9 in the upper picture matches pixel numbered 8 in the second picture.

Therefore, deep matching identifies more matching pixels, which gives much more information for the further reconstructed 3D points cloud. However, deep matching does not directly output information usable in 3D reconstruction. More precisely, the outputs of deep matching tracking are not compatible with the inputs of SFM reconstruction, and the inventors propose a bridge between those two approaches to build up a full and efficient pipeline.

Therefore, the present invention proposes combining deep matching techniques and SFM techniques, to obtain files usable in 3D recognition. In particular, the present invention proposes transforming deep matching files in order to be compatible with SFM ones. In order to have a usable 3D reconstruction for the method of recognition according to the present disclosure, those two formats have to be fully and automatically compatible, which is a part of the present invention.

This conversion is shown on FIG. 16 and comprises the following steps: (1) generation of the deep matching SIFT files and the matching file, and (2) generation of the SFM SIFT files for the same set of pictures. As explained above with reference to FIGS. 14 and 15, deep matching identifies more matching pixels, i.e. extra matching pixels which were not identified by SFM. Therefore, in a third step, the SFM SIFT files are augmented with the deep matching tracked extra matching pixels in order to add more points to the cloud. After this third step, these added extra matching pixels are not yet usable by the reconstruction algorithms. The last step of the conversion therefore comprises the computation of compatible features for the augmented file, i.e. the identification and conformation of pixels that haven't been identified by SFM, those pixels being now usable for the 3D reconstruction of corresponding voxels, i.e. for reconstructing the 3D points cloud.
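A minimal sketch of steps 3 and 4 is given below, with hypothetical in-memory structures standing in for the SIFT and matching files; compute_features is a placeholder for the feature computation that makes the added pixels SFM-compatible.

    def augment_sfm_with_deep_matching(sfm_i, sfm_j, deep_matches, compute_features):
        # sfm_i / sfm_j: {(x, y): 128-feature vector} for pictures i and j,
        # parsed from the SFM SIFT files.
        # deep_matches: list of ((x_i, y_i), (x_j, y_j)) pixel pairs taken
        # from the deep matching file.
        for (p_i, p_j) in deep_matches:
            if p_i not in sfm_i:                    # extra pixel missed by SFM
                sfm_i[p_i] = compute_features(p_i)  # conform it for reconstruction
            if p_j not in sfm_j:
                sfm_j[p_j] = compute_features(p_j)
        return sfm_i, sfm_j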

The key points 5030 are identified by the above 2D tracking process throughout all the pictures 5000 of the set of pictures, in which each point from a picture is identified in other pictures. If the pictures were acquired through a video, the pictures correspond to frames of the video. In other words, an initial set of searchable points 5032 is defined in a first picture, and the 2D tracking process tries to identify the searchable points 5032 in the other pictures of the set to extract the key points 5030. This is shown on FIGS. 13B and 13C.

The searchable points are refined throughout the process. Points are added, others are suppressed. During the key point extraction process, the set of key points is compared to the initial set of searchable points. Should the number of key points be too low, other searchable points would have to be added to the initial set of searchable points in order to be tracked again. There is no real minimum in the number of key points to be tracked, but the 3D reconstruction and the following comparison process are more efficient with dozens of points. It is to be noted that this process is applied to every pixel in each picture. Refining the 3D points clouds is performed by dynamically extending the range of matching pictures. Indeed, pictures are compared pixel by pixel to others that are “close to them” in the input video/frame sequence, i.e. previous or next pictures. Should the model need to be refined due to a lack of 3D points, pictures will be compared to the “p” preceding/following ones, p being adjusted dynamically to extend the range of 2D pictures in which the system searches matching pixels.

The skilled person will further understand that it is important that the object 50 remain motionless while being captured, to allow a successful key point extraction process.

In order to reconstruct the 3D points cloud, the "virtual world" is preferably calibrated to obtain a relative measurement system. Indeed, the system will generally not be able to calculate absolute dimensions from the pictures set unless at least one picture contains a distance reference, i.e. an object 50 whose size/dimensions are known. Most of the time, this will not be the case. However, the object 50 will have coherent proportions although it will not have the right absolute size. For example, should the end user scan a mug, the system will recognize that the object 50 is a mug but won't be able to determine whether this is a regular mug or the same mug in a giant version that could be exposed in front of a store as an advertising totem. Nevertheless, the system will send back an "OK" to the user, considering the mug is a regular one that can be found, bought, shared.

The calibration is made using triangulation algorithms. If two or more cameras whose positions in space are well known see a specific point, triangulation based on elementary trigonometric formulas can determine the exact position of this specific point in space (i.e. in 3 dimensions). In the reverse process, if one tracked point is seen from different viewpoints (even though these different viewpoints are given by one single moving device), these different viewpoints can be positioned in a 3D space relative to the tracked points, and thus to the captured object.

The calibration is done at node 22.5, in which the camera is calibrated using matching 2D points, as explained below.

In order to reconstruct 3D points clouds from sets of pictures, the 2D pictures should be placed back in a 3D environment, by providing the answers to the following questions: from where in space were the pictures taken, and where in space are the 2D tracked points located.

The geometrical system at the time of capture can be represented on FIG. 6, with reference to the system of FIG. 2. The device 10 is represented here through its optical center O and its focal plane ("image plane"). The image of the object 50 is made of numerous points P(X,Y,Z). The correspondence between the camera "C" and the object "P" is given by the following formula: Pc = CP, where Pc is the projection of P on the image plane and C is the complete camera calibration matrix. The calibration matrix C is related to the device 10 and remains the same for a whole capture session. For example, C can be a 3×4 matrix (12 unknowns).
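
As a numerical illustration of Pc = CP (the matrix values below are chosen arbitrarily, not taken from the disclosure):

    import numpy as np

    # Projection of a 3D point P onto the image plane through a 3x4 camera
    # calibration matrix C; the entries of C are illustrative only.
    C = np.array([[800.0,   0.0, 320.0, 0.0],
                  [  0.0, 800.0, 240.0, 0.0],
                  [  0.0,   0.0,   1.0, 0.0]])
    P = np.array([1.0, 2.0, 5.0, 1.0])   # 3D point in homogeneous coordinates
    Pc = C @ P                           # Pc = C P
    u, v = Pc[0] / Pc[2], Pc[1] / Pc[2]  # 2D pixel coordinates on the image plane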

The method for reconstruction is thus to calculate the calibration matrix C (calibrating the camera) in a reference frame and then to apply the transformation to other frames in order to position as many P points as possible in the space. It should be noted that the point P has 3 coordinates and is thus positioned in a 3D space.

The calibration matrix C is calculated knowing a few correspondences between 3D points and their 2D projections on the camera image plane. The 2D projection coordinates are known in the image plane, while the 3D coordinates are known in an arbitrary 3D space (i.e. P could be considered, for example, as the center of the 3D world). Pc = CP provides 2 equations containing 12 unknowns, meaning that at least 6 correspondences must be known in order to solve for C. Those correspondences are determined using fiducial based image processing methods.
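
A compact way to solve for the 12 unknowns from six or more correspondences is the classical direct linear transform, sketched below (a standard technique shown for illustration, not claimed by the disclosure as such):

    import numpy as np

    # Direct linear transform sketch: each correspondence between a 3D point
    # (X, Y, Z) and its 2D projection (u, v) yields the 2 equations of Pc = CP;
    # with at least 6 correspondences the 12 unknowns of C can be solved.
    def solve_calibration(points_3d, points_2d):
        rows = []
        for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
            P = [X, Y, Z, 1.0]
            rows.append(P + [0.0, 0.0, 0.0, 0.0] + [-u * p for p in P])
            rows.append([0.0, 0.0, 0.0, 0.0] + P + [-v * p for p in P])
        # The entries of C form the null vector of the stacked linear system.
        _, _, Vt = np.linalg.svd(np.asarray(rows))
        return Vt[-1].reshape(3, 4)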

Once the calibration matrix C is known, a point Q in space can be found through the reverse equation Q = C⁻¹Qc, where C and Qc are known. Q has 3 coordinates, i.e. 3 unknowns. It thus requires another point of view with the same camera to solve the system and position Q in the 3D space.

These calculations are made without any indication of the real dimensions in space. The reconstructed objects have the right geometry but there is no indication about their sizes unless the camera field of view contains another object whose dimension is well known. This is, however, not a prerequisite for the present disclosure. Computing tools for geometry and trigonometry can be found in open source libraries (like OpenCV or VisualSFM), libraries that have been available in open source since June 2000. Those libraries provide numerous tools for digital picture analysis, such as automatic 3D camera calibration matrix calculation (calibrateCamera, calibrationMatrixValues . . . ) or 3D triangulation from different 2D pictures (triangulatePoints).
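
For instance, a tracked point seen from two calibrated viewpoints can be positioned in 3D with OpenCV's triangulatePoints; the wrapper below is a minimal sketch:

    import numpy as np
    import cv2

    # Triangulate matching pixels from two viewpoints whose 3x4 camera
    # matrices C1 and C2 are known (e.g. from the calibration above).
    def triangulate(C1, C2, pts1, pts2):
        """pts1, pts2: (2, n) arrays of matching pixel coordinates."""
        points_4d = cv2.triangulatePoints(C1, C2, pts1, pts2)
        return (points_4d[:3] / points_4d[3]).T  # (n, 3) Euclidean 3D points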

Once the calibration is done, the key points 5030 identified in the key point extraction step are placed on the object 50. This is illustrated on FIG. 13D.

The 3D points cloud reconstruction is thereafter made as an ongoing process in which 3D vertices (i.e. 3D points) are added in the obtained relative measurement system, at nodes 22.7 and 22.8. Those 3D vertices are added together to build the reconstructed 3D points cloud 5050 as seen on FIG. 13E.

A 3D vertex is the result of the reconstruction enabled by the key points 5030 identified from the pictures 5000.

The result of the 3D points cloud reconstruction is a file comprising a reconstructed 3D points cloud 5050 in a format understandable by 3D software.

A standard file format is a .ply file, which is a regular file format for 3D files. Most 3D software understands and generates this format from and to all other 3D formats (obj, stl, 3DS max, ma, mb . . . ). The ply format is also very efficiently compressible (nondestructively) and transportable through the network, although this is not really an issue here since the 3D points cloud reconstruction and the 3D points cloud comparison are both computed server side.

It should be noted that besides having a metric reference in the scene being shot, another way to calculate an absolute metric is to use a gyroscope and an accelerometer if the device 210 is equipped with those. With the combined data provided by the accelerometers/gyroscopes now embedded in most portable devices, it is possible, without any measure information in the 2D pictures, to calculate with good accuracy the size and measures (2D and 3D) of the 3D reconstructed objects. Absolute 2D dimensions can be provided with a satisfying precision of less than 5% of error, which is usable from the user's point of view. 3D volumes can be computed with a homemade algorithm with only one 2D measurement information (length, height, depth, width . . . ), as detailed with reference to FIG. 17.

First, a planar surface has to be identified in the 3D reconstructed cloud 5050. It can be at the edge of the surface, or just part of the object. A virtual grid 5061 is applied on this surface, splitting the surface into squares, and the whole volume into square-based parallelepipeds. The 3D reconstructed volume is automatically filled with parallelepipeds 5062 whose base is the grid's unit square and whose height is delimited by the points cloud. The volume of each parallelepiped "p" is l² × h_p, where l is the side length of the square at the parallelepiped's base and h_p is the height of the parallelepiped "p", each h_p being determined by the surface of the 3D reconstructed points cloud "limiting" the parallelepiped. The volume of the whole points cloud is the sum of all the parallelepipeds' volumes. Heights are given in absolute values so that the calculation is still correct if vertices are located in either half space delimited by the plane of the grid.

The parameter that can be adjusted in order to optimize the precision/computational load tradeoff is the size of the grid's square: the smaller it is, the more precise the volume.
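
A minimal sketch of this volume computation, assuming the identified planar surface has been aligned with the plane z = 0 (that alignment and the grid side l are the only inputs):

    import numpy as np

    # Grid-based volume estimate: bin vertices into grid squares of side l
    # and sum l^2 * h_p over all occupied squares; a smaller l gives a more
    # precise volume at a higher computational cost.
    def estimate_volume(vertices, l):
        """vertices: (n, 3) array in a frame where the planar surface is z = 0."""
        heights = {}
        for x, y, z in vertices:
            cell = (int(np.floor(x / l)), int(np.floor(y / l)))
            # |z| so vertices on either side of the grid plane count correctly.
            heights[cell] = max(heights.get(cell, 0.0), abs(z))
        return sum(l * l * h for h in heights.values())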

It is therefore intended to use 2D or 3D measures (lengths, dimensions, perimeters, surfaces, volumes . . . ) as one of the criteria for database matching. This is very useful to decimate the database, although one may find it useful to have a globally scale invariant recognition in order to display every available size of a recognized model. Thus, measurements of the scanned object must only be considered as a fine-tuning parameter to discriminate identical objects by their sizes (shoes, clothes, is this car a model or a real one? . . . ).

Global measurements can be used as a discriminant criterion considering ratios of the dimensions of 3D objects' bounding boxes 5065, as shown on FIG. 18. Ratios like length/height or depth/height of the bounding boxes are dimensionless ratios invariant to scale and are the proof of different objects if they differ.
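
A sketch of such a ratio test follows; the axis naming and tolerance are illustrative assumptions:

    import numpy as np

    # Scale-invariant bounding-box ratios used to discriminate objects.
    def bbox_ratios(vertices):
        """vertices: (n, 3) array; assumes x = length, y = height, z = depth."""
        length, height, depth = vertices.max(axis=0) - vertices.min(axis=0)
        return length / height, depth / height

    def same_shape(cloud_a, cloud_b, tol=0.05):
        ra, rb = np.array(bbox_ratios(cloud_a)), np.array(bbox_ratios(cloud_b))
        # Differing ratios prove different objects, whatever their scales.
        return bool(np.all(np.abs(ra - rb) <= tol))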

The reconstructed 3D points cloud 5050 is forwarded to the storage server 230 for a 3D match search (node 22.9 and node 23.23). The 3D match search is done with a 3D points cloud comparison made using the ply files. The comparison compares the user-generated ply file 5050 with known ply files 5052 stored in the 3D database 238. It should be noted that the database ply files 5052, associated with each known object stored in the database, are automatically generated from their 3D models regardless of original format, because ply files can easily and automatically be generated from most regular file formats. It should be noted that the 3D match search process starts as soon as some 3D points are identified. The 3D match search is then enriched with newly reconstructed 3D points as long as the recognition process is going on (i.e. no match is found), giving more and more precision and weight to the 3D part of the recognition.

Two main methods can be used to perform the comparison: 3D geometric comparison or machine learning. The skilled person is aware that 3D geometric comparison becomes efficient quickly. Alternatively, solutions may be chosen among existing libraries such as the "Point Cloud Library" or "Geometry Factory" libraries, which embed root algorithms like point source ray projections, principal component analysis in eigenspace projections or locality sensitive hashing. Those libraries and root techniques can be applied to compare ply files and find a match, but also to efficiently eliminate non-fitting database objects from the identification process, which is almost as important in the matching process.

Concerning the “purely geometrical” matching (3D geometric comparison),ICP (Iterative Closest Points) algorithms can easily match a 3Dreconstructed object with one of a database using factors invariant toscale, thus giving orientation and position matching as shown on FIGS.19A to 19B. An alignment step is performed, in which the database modelis aligned with the reconstructed points cloud 5050 intranslation/rotation. This is shown on FIG. 19C.
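
An ICP iteration can be sketched in a few lines; this is a minimal rigid-alignment version using the classical closest-point and Kabsch steps, whereas a real pipeline would use a library implementation:

    import numpy as np
    from scipy.spatial import cKDTree

    # Minimal ICP sketch: align source to target in translation/rotation.
    def icp(source, target, iterations=20):
        """source: (n, 3) array; target: (m, 3) array; returns aligned source."""
        tree = cKDTree(target)
        src = source.copy()
        for _ in range(iterations):
            _, idx = tree.query(src)       # closest target point per source point
            matched = target[idx]
            # Kabsch: best rotation R and translation t for the correspondences.
            mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
            H = (src - mu_s).T @ (matched - mu_t)
            U, _, Vt = np.linalg.svd(H)
            R = Vt.T @ U.T
            if np.linalg.det(R) < 0:       # avoid reflections
                Vt[-1] *= -1
                R = Vt.T @ U.T
            t = mu_t - R @ mu_s
            src = src @ R.T + t
        return src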

A denoising of the points clouds is performed as shown in FIG. 19D. The denoising step is performed through regular techniques ("Conditional Removal", "Radius Outlier Removal (ROR)", "Statistical Outlier Removal (SOR)", those being iterative). Segmentation can be done, wherein segmentation consists in eliminating points that would not be present on all captured pictures and using the ongoing 3D reconstruction to "segment" the 2D pictures and find more relevant key points for reconstruction.
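
As an example of these regular techniques, a Statistical Outlier Removal pass can be sketched as follows; k and the deviation ratio are illustrative values:

    import numpy as np
    from scipy.spatial import cKDTree

    # SOR sketch: points whose mean distance to their k nearest neighbors is
    # abnormally large are considered noise and dropped.
    def statistical_outlier_removal(points, k=8, std_ratio=2.0):
        tree = cKDTree(points)
        dists, _ = tree.query(points, k=k + 1)  # first neighbor is the point itself
        mean_d = dists[:, 1:].mean(axis=1)
        threshold = mean_d.mean() + std_ratio * mean_d.std()
        return points[mean_d < threshold]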

However, should segmentation be insufficient or incomplete, as shown on FIG. 20A where different objects are still remaining in the 3D reconstruction, the present invention proposes further adding an advanced segmentation step called "clustering". The difference between segmentation and clustering is mainly that segmentation is processed during the reconstruction whereas clustering is applied on the reconstructed points cloud. The aim of the clustering is to separate, in the 3D reconstructed point cloud 5050, the different reconstructed 3D models coming from different objects seen in the set of 2D pictures. This allows separating different 3D objects into different clusters of 3D points and thus performing matching algorithms on each of them for a full recognition process. For example, FIG. 20 shows two clusters that are connected together.

This clustering process is described in FIG. 20B. The 3D reconstructed space resulting from the 3D reconstructed point cloud 5050 is fully sampled with virtual voxels 5095, i.e. 3D cubes, and the number of reconstructed vertices 5051 contained in each of these cubes is counted. When a cube contains "few" vertices, i.e. fewer than a predetermined threshold, those vertices are removed from the reconstructed points cloud 5050. This allows removing the remaining noise and thus separating from each other the different groups of vertices 5051 (3D clusters) that can then be considered as separate 3D models. The resulting separation is shown on FIG. 20C, showing two distinct clusters.

It should be noted that the size of those virtual voxels 5095 and the threshold under which the number of vertices is considered "too small" are variables of the denoising part of the code and can be dynamically adjusted from other parameters, such as the number of vertices of the whole reconstruction or the minimum distance between 2 vertices in the reconstructed points cloud.
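
A minimal sketch of this voxel-counting pass, with the voxel size and threshold shown as fixed values although, as just noted, they would be adjusted dynamically:

    import numpy as np
    from collections import defaultdict

    # Voxel-based clustering pass: sample the reconstructed space with cubes
    # of side `voxel`, count the vertices in each cube, and drop the sparsely
    # populated cubes; the surviving groups of vertices form the 3D clusters.
    def voxel_filter(vertices, voxel=0.05, min_count=5):
        """vertices: (n, 3) array; returns the vertices kept after filtering."""
        cells = defaultdict(list)
        for i, v in enumerate(vertices):
            cells[tuple(np.floor(v / voxel).astype(int))].append(i)
        keep = [i for idx in cells.values() if len(idx) >= min_count for i in idx]
        return vertices[keep]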

A scaling step is thereafter performed, illustrated with reference to FIGS. 18 and 19E. Recurrent iterations are performed on the bounding boxes 5065 in order to match their scales. This operation is very cheap in terms of computational load. Thus, the reconstructed point cloud 5050 and the database point cloud 6050 will fit in all of the 3 features position/rotation/scale, and the distance between the points clouds can be computed. Searching for the minimal distances will give the best match in the database, as shown on FIG. 19E.
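
Once the clouds fit in position/rotation/scale, the final ranking can be sketched as a closest-point distance minimized over the database (helper names are illustrative):

    import numpy as np
    from scipy.spatial import cKDTree

    # Score each database model by the mean closest-point distance to the
    # reconstructed cloud; the minimum gives the best match.
    def cloud_distance(reconstructed, model):
        d, _ = cKDTree(model).query(reconstructed)
        return d.mean()

    def best_match(reconstructed, database_models):
        return min(database_models, key=lambda m: cloud_distance(reconstructed, m))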

The above 3D geometric comparison based on geometrical matching is one tool for object recognition.

Machine learning is also very efficient although it needs a high amount of inputs associated with outputs to give good results. Fortunately, the method of the present disclosure provides this high amount of data since the database objects 50 contain a 3D representation. It is possible to randomly generate a big amount of ply files of any detail level and match them with the known original object 50. This machine learning approach relies on AI algorithms such as linear HOG (Histogram of Oriented Gradients) or cascade classifiers of Haar features. It certainly requires an important calculation power since those neural network based techniques are highly demanding in terms of calculation, but this process can be dealt with independently and upstream of the recognition process.

The machine learning/deep learning process in the present disclosure involves both 2D machine learning and 3D machine learning, the latter being performed on the 3D parameters of the reconstructed object and its subparts, as detailed below.

The invention proposes to identify and treat essential 3D parameters extracted from the adequate 3D reconstruction as key points of an object, such as peaks, tops, edges, shapes, reliefs, as well as its texture, colors, materials . . . . More specifically, the first step of the machine learning process is to extract 3D descriptors 6012 and geometrical "primitives" 6013 from the 3D reconstructed models of known objects 5060 and from the 3D reconstruction point cloud 5050.

Indeed, any 3D object can be derived from simple 3D objects, called "primitives", such as planes, spheres, cylinders, cones or tori. In a reverse process, any 3D object can be separated into a collection of those elementary shapes, as shown on FIG. 21. Those elementary shapes are then spatially connected to each other through graphs that describe their spatial connectivity to form the whole object, as shown on FIG. 22.

Those geometrical primitives 6013 can be considered as a so-called "bag of features" that can be used for object recognition.
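
One possible data structure for such a decomposition, shown here as an assumed, simplified parameterization rather than the disclosure's actual format:

    from dataclasses import dataclass, field

    # Primitive-graph sketch: each elementary shape is a node carrying a few
    # parameters, and edges record the spatial connectivity between shapes.
    @dataclass
    class Primitive:
        kind: str     # "plane", "sphere", "cylinder", "cone" or "torus"
        params: tuple # e.g. (center, radius, height) for a cylinder

    @dataclass
    class PrimitiveGraph:
        nodes: list = field(default_factory=list)  # Primitive instances
        edges: set = field(default_factory=set)    # pairs of node indices

        def connect(self, i, j):
            self.edges.add((min(i, j), max(i, j)))

    # A mug, for example: a cylinder (body) connected to a torus (handle).
    mug = PrimitiveGraph()
    mug.nodes += [Primitive("cylinder", ((0, 0, 0), 0.04, 0.10)),
                  Primitive("torus", ((0.05, 0, 0.05), 0.03, 0.008))]
    mug.connect(0, 1)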

The machine learning model proposed in the present disclosure involves a neural network along with pre-engineering and preprocessing of the data, with the setup of relevant and efficient 3D feature vectors as inputs to the neural network. In other words, the invention contains a new way of describing 3D objects that is usable for machine learning/deep learning. For this purpose, instead of learning or matching the whole reconstructed points clouds 5050, 6050, local descriptors 6012, 6013 that are related to the object 50 are used.

Those descriptors 6012, 6013 are related to identified vertices 5051 of the plurality of vertices 5051 of the reconstructed 3D point cloud 5050 of the object to be identified, and likewise for the known reconstructed 3D point cloud 6050 of a known object model used in the comparison. The vertices identified as relevant for the object recognition are considered in their immediate neighborhoods: normal analysis, curvature radii, extraction of edges, corners, summits, planes, local surfaces, local symmetries . . . .

On a further level, the method also uses 3D primitives as descriptors 6012, 6013 since any 3D object can be split into sub-objects that are related together, as shown on FIGS. 21 and 22. Such local descriptors or primitives are much more convenient to describe than the whole 3D model. For example, 2D normal vectors are used, which can be encoded into 3D feature vectors both for the database (training/learning) and for the reconstructed objects (matching/recognition). An example of normal vector organization is given in FIG. 22.

Hence, any local descriptor can be analyzed with fewer parameters than the whole 3D object. The 3D feature vectors used as inputs to the neural network are built according to this structure, i.e. matrices formed by local simplified information linked together with graphs describing their spatial connectivity.
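
As an illustration, a local descriptor around one vertex can be encoded into a fixed-size feature vector; the composition below (a normal-angle histogram plus a curvature value) is an assumption made for illustration:

    import numpy as np

    # Encode a vertex neighborhood into a fixed-size feature vector: a
    # histogram of the angles between neighbor normals and the vertex normal,
    # concatenated with a curvature estimate.
    def local_feature_vector(normals, center_normal, curvature, bins=8):
        """normals: (k, 3) unit normals of the neighborhood; returns (bins + 1,)."""
        cos_angles = np.clip(normals @ center_normal, -1.0, 1.0)
        hist, _ = np.histogram(np.arccos(cos_angles), bins=bins, range=(0, np.pi))
        hist = hist / max(hist.sum(), 1)  # normalize to make it scale-free
        return np.concatenate([hist, [curvature]])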

The invention also includes the automation of the whole process, enabling the solution to consider "online learning", meaning the use of the 3D data acquired by users to augment the reference database. Indeed, the algorithms developed can convert any 3D object into 3D feature vectors that will themselves be part of the training/learning process.

It is to be noted that the recognition algorithms developed for this invention can also be used on the full 3D reconstructed objects, but that full analysis is more resource and time consuming and gives poor results compared to their use on smaller and better identified objects. The combination of matching small objects (primitives) with their connectivity graphs is the key to efficient matching.

The 3D points cloud reconstruction obtained from pictures, as shown on FIG. 5 or FIG. 13, allows the use of the 3D envelope to perform "segmentation" on the reconstructed object, as mentioned earlier in the disclosure. In other words, the 3D object is used in each picture that has been part of the 3D reconstruction to isolate the object in the picture. This is shown on FIG. 7. A matching 3D object from the 3D database 38 is used to isolate relevant information and obtain a histogram 2010 of the segmented picture. This is the segmentation. Segmentation is used in addition to or in combination with further clustering of 3D objects in the reconstructed points cloud in case the points cloud contains several objects (which the clustering step will reveal). The histogram 2010 of the segmented picture can be compared to histograms 2020 of objects in the database 38 and become a criterion of comparison.

This segmentation offers better performance for the matching algorithms described in this disclosure, for example in O.C.R. (character recognition), where only relevant characters are kept in the analysis, or in color analysis, giving much more accurate histograms as described on FIG. 8.

The skilled person will understand that the method for recognition is an ongoing process. It means that during the capture of the pictures data, pictures are sent for computing (nodes 1.3 & 2.1, or nodes 21.3 & 22.1). Hence, the first treatments of the first pictures are computed to obtain a computed object 50 while further pictures data are being acquired for the same object 50 to be identified. Indeed, the skilled person will understand that pictures are taken as long as necessary, meaning as long as the object 50 has not been identified (although an overall time out can be set, as explained above). Hence, as noted above, the length of the acquisition time is dynamic and may be adapted depending on the 3D points cloud reconstruction made from the dynamic picture set. Thus, if the computed points cloud is not sufficient in terms of number of points, the length of the frames acquisition is extended. The gyroscope/accelerometer, if available on the device, can also be used to fill up empty areas with 2D pictures. For example, it has been established so far that a minimum of 20 pictures is required. Best results are obtained if the angle between two pictures is rather small, about 1 degree; thus, 20 to 30 pictures are required for a 20 to 30 degree acquisition. An overall time out can be set to avoid infinite looping in the process.

An important point to keep in mind in the recognition is the "decimation of the database", i.e. the elimination of every non-matching object as soon as a criterion allows it. This "decimation" process comes with lowering the weights that are assigned to every potential solution while the process is ongoing, as explained in [0081] below.

In one aspect, regular picture compression algorithms are used to speed up this step of picture computing. These algorithms are non-destructive in order to optimize the frame-by-frame treatments. For example, non-destructive image compression is used in image formats such as "png", "tiff", "gif", "jpeg2000". The regular picture compression algorithms are adapted from open source algorithms, such as entropy coding or dictionary-based compression algorithms. This item also includes server side communications between the "cloud server" and the "cloud storage": node 2.1.

Entropy coding is a lossless data compression method that assigns a specific code to a specific piece of information, this code being easier to transport than the original coding.

For example, let's assume a picture of a car contains 12M pixels with 10M red pixels; the entropy coding will assign the value "1" to the red color instead of the "usual" (255,0,0) color codification. Usual and efficient algorithms that can be easily implemented are "Shannon-Fano coding" and "Huffman coding", the latter being an optimized version of the former.
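
A minimal Huffman-coding sketch follows (a standard technique; the symbol frequencies below are invented to echo the red-pixel example):

    import heapq
    from collections import Counter

    # Build prefix codes so that frequent symbols get the shortest codes.
    def huffman_codes(symbols):
        heap = [[freq, i, {sym: ""}]
                for i, (sym, freq) in enumerate(Counter(symbols).items())]
        heapq.heapify(heap)
        n = len(heap)
        while len(heap) > 1:
            lo, hi = heapq.heappop(heap), heapq.heappop(heap)
            # Merge the two rarest subtrees, prefixing their codes with 0/1.
            merged = {s: "0" + c for s, c in lo[2].items()}
            merged.update({s: "1" + c for s, c in hi[2].items()})
            heapq.heappush(heap, [lo[0] + hi[0], n, merged])
            n += 1
        return heap[0][2]

    # The dominant "red" symbol receives the shortest code.
    codes = huffman_codes(["red"] * 10 + ["green", "blue"])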

Another compression method could be the Lempel-Ziv-Welch (LZW) algorithm. This compression method assumes that the item to encode is available as a character chain, which is the definition of any digital signal. The LZW algorithm encodes sequences of characters by creating new characters in a "character dictionary" from the sequences read, as seen in the tables of FIG. 9.

The dictionary starts with 2 characters: 0 and 1. While reading the first character "1", the algorithm finds the new character "10" made of the 2 first characters of the original chain and adds it to the dictionary (character #2). While reading the second character "0", it adds the new character "00" to the dictionary (character #3). While reading the 3rd character of the chain, it adds "01" to the dictionary (character #4). While reading the 4th character, it adds "11" (character #5) to the dictionary. The 5th and 6th characters are "1" and "1", which is character #5 of the dictionary. In the meantime, "110" is added to the dictionary as character #6. The compression continues further in the same manner. In the end, the original chain of 15 items is coded with a chain of 8 items.
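
A textbook LZW encoder for such a binary chain can be sketched as follows; the input chain is invented, and the dictionary numbering in the tables of FIG. 9 may differ in detail:

    # Minimal LZW-encoder sketch: the dictionary starts with "0" and "1"
    # and grows with each new sequence read from the chain.
    def lzw_encode(chain, alphabet=("0", "1")):
        dictionary = {sym: i for i, sym in enumerate(alphabet)}
        current, output = "", []
        for ch in chain:
            if current + ch in dictionary:
                current += ch  # keep extending a known sequence
            else:
                output.append(dictionary[current])
                dictionary[current + ch] = len(dictionary)  # new "character"
                current = ch
        output.append(dictionary[current])
        return output

    print(lzw_encode("101101110001011"))  # 15 characters -> 10 codes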

In one embodiment, server side computing involves many techniques processed simultaneously in order to eliminate non-fitting objects from the object database 35, 38. Each time a non-fitting object is eliminated, the technique used to eliminate this non-fitting object is remembered, thus giving a weight to the efficiency of this technique for this object 50 to be identified. This weight is then used to prioritize and speed up the process. The weight is also stored for further statistics. For example, should an object 50 have characters on it, all the known objects stored in the database without characters are immediately eliminated; should the red color be identified in an object 50, all known objects without red stored in the database would be eliminated.
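
The weighting of elimination techniques can be sketched as follows; the technique signatures and the weight-update rule are assumptions for illustration:

    # Database "decimation" sketch with per-technique weights: a technique
    # that eliminates many candidates gains weight and is tried earlier on
    # the next pass. Each technique returns True if the object still fits.
    def decimate(candidates, techniques, weights):
        for tech in sorted(techniques, key=lambda t: -weights[t.__name__]):
            before = len(candidates)
            candidates = [obj for obj in candidates if tech(obj)]
            weights[tech.__name__] += before - len(candidates)  # reward eliminations
            if len(candidates) <= 1:
                break  # a single remaining object means a match is found
        return candidates, weights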

Another example is the QR-code or bar-code: should the object 50 have one of those, the match would immediately be found and displayed. This specific embodiment is not the purpose of the present disclosure but is given as an example of the recognition process.

It is important to understand that the present system and method are not meant to obtain a dense 3D reconstruction of the object 50. However, the 3D points cloud reconstruction can be computed with efficiency and accuracy from several views of the object 50. This is a tradeoff between accuracy and resources: the more views, the more accurate the points cloud, but the more calculation to compute.

Once the object 50 has been identified after the match search in either the first database 35 or the 3D database 38, the information is returned to the device 10, for display and/or further action on the device 10 under at least one of many forms: 3D interactive representation compatible with all devices, available metadata, 3D printable compatible export . . . . This also includes all social network sharing and usual search engines since text metadata is also embedded with the object 50. It is to be noticed that although it is not part of our invention, the displaying of objects will stay close to the technological evolution and all modern techniques known or to come in these domains. As an example, developments in fields such as Augmented Reality (A.R.) or Virtual Reality (V.R.) are very popular and are to be taken into consideration.

The method for recognition is preferably shown in real time to the user through a user-friendly interface. The main parameter is the number of objects from the database still matching. The process ends "OK" when only one object 50 is found, and "KO" when no match is found or on time out, as explained above. Nevertheless, the user can be asked to help the matching process through simple "MCQ" (Multiple Choice Questions) to ease the recognition (node 4.2). Those questions/answers can be very simple: size/dimension, material, brand, family of object 50 (food, accessory, car . . . ), accuracy of 2D capture . . . . Those questions can be asked according to at least one of the ongoing process, previous decimations in the objects database and the remaining objects' metadata. In all of these cases, matching models come with a weight that represents their "score" in the matching process, and the "best" scores can be displayed in a user-friendly interface that allows a choice. It is predictable that objects will sometimes be close, like a specific mug or shoe being close to another mug or shoe. "Close results" will then be displayed, giving useful information to the user.

The foregoing description of the preferred embodiments of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. The embodiment was chosen and described in order to explain the principles of the invention and its practical application to enable one skilled in the art to utilize the invention in various embodiments as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto, and their equivalents.

1-16. (canceled)
 17. A computer-implemented method of object recognition, the method comprising: receiving a plurality of pictures of an object; reconstructing, from the plurality of pictures, a 3D point cloud of the object, wherein the 3D point cloud comprises a plurality of vertices in a 3D space; extracting, from the 3D point cloud, a plurality of 3D descriptors, wherein each descriptor comprises information about: at least one vertex of the plurality of vertices, and a 3D primitive of the 3D point cloud; encoding the plurality of 3D descriptors into one or more 3D feature vectors; performing a recognition process, based on the 3D feature vectors, to identify the object; and outputting an identifier of the object.
 18. The method of claim 17, wherein performing the recognition process comprises inputting the 3D feature vectors to a machine learning algorithm.
 19. The method of claim 17, wherein performing the recognition process comprises executing a 3D geometric comparison, execution of the 3D geometric comparison comprising comparing the 3D point cloud with 3D models of known objects based on the one or more 3D feature vectors.
 20. The method of claim 19, wherein executing the 3D geometric comparison comprises executing a segmentation and a clustering, wherein the segmentation is performed during the step of reconstructing the 3D point cloud, and wherein the clustering is executed on the 3D point cloud.
 21. The method of claim 20, wherein the clustering separates multiple objects visible in the plurality of pictures.
 22. The method of claim 17, wherein each 3D primitive is representative of a geometrical shape, and wherein the geometrical shape is a plane, sphere, cylinder, cube, or torus.
 23. The method of claim 17, further comprising performing a 2D match search in a 2D pictures database, the 2D pictures database comprising 2D pictures randomly generated from 3D models of known objects.
 24. The method of claim 23, wherein the 2D match search is performed before the recognition process, and wherein the recognition process is performed after the 2D match search fails to find a match.
 25. The method of claim 23, wherein the 2D pictures are randomly generated by performing at least one of generating different random lightings, generating different random points of view, or generating different random exposures.
 26. The method of claim 17, wherein reconstructing the 3D point cloud comprises: extracting a plurality of key points from the plurality of pictures; defining the plurality of vertices, wherein a vertex of the plurality of vertices corresponds in 3D to a key point of the plurality of key points; and deriving the 3D point cloud based on the plurality of vertices.
 27. The method of claim 17, wherein reconstructing the 3D point cloud comprises: extracting a plurality of key points from the plurality of pictures; defining a plurality of 3D slices of the object, wherein each 3D slice of the plurality of 3D slices comprises at least one key point of the plurality of key points; and constructing, based on the 3D slices, the 3D point cloud.
 28. A system comprising at least one processor and memory storing a plurality of executable instructions which, when executed by the at least one processor of the system, cause the system to: receive a plurality of pictures of an object; reconstruct, from the plurality of pictures, a 3D point cloud of the object, wherein the 3D point cloud comprises a plurality of vertices in a 3D space; extract, from the 3D point cloud, a plurality of 3D descriptors, wherein each descriptor comprises information about: at least one vertex of the plurality of vertices, and a 3D primitive of the 3D point cloud; encode the plurality of 3D descriptors into one or more 3D feature vectors; perform a recognition process, based on the 3D feature vectors, to identify the object; and output an identifier of the object.
 29. The system of claim 28, wherein the instructions that cause the system to perform the recognition process comprise instructions that cause the system to input the 3D feature vectors to a machine learning algorithm.
 30. The system of claim 28, wherein the instructions that cause the system to perform the recognition process comprise instructions that cause the system to execute a 3D geometric comparison by comparing the 3D point cloud with 3D models of known objects based on the one or more 3D feature vectors.
 31. The system of claim 28, wherein each 3D primitive is representative of a geometrical shape, and wherein the geometrical shape is a plane, sphere, cylinder, cube, or torus.
 32. A non-transitory computer-readable medium containing instructions which, when executed by a processor, cause the processor to: receive a plurality of pictures of an object; reconstruct, from the plurality of pictures, a 3D point cloud of the object, wherein the 3D point cloud comprises a plurality of vertices in a 3D space; extract, from the 3D point cloud, a plurality of 3D descriptors, wherein each descriptor comprises information about: at least one vertex of the plurality of vertices, and a 3D primitive of the 3D point cloud; encode the plurality of 3D descriptors into one or more 3D feature vectors; perform a recognition process, based on the 3D feature vectors, to identify the object; and output an identifier of the object.
 33. The non-transitory computer-readable medium of claim 32, wherein the instructions, when executed by the processor, cause the processor to perform a 2D match search in a 2D pictures database, the 2D pictures database comprising 2D pictures randomly generated from 3D models of known objects.
 34. The non-transitory computer-readable medium of claim 32, wherein the instructions that cause the processor to reconstruct the 3D point cloud comprise instructions that cause the processor to: extract a plurality of key points from the plurality of pictures; define the plurality of vertices, wherein a vertex of the plurality of vertices corresponds in 3D to a key point of the plurality of key points; and derive the 3D point cloud based on the 3D vertices.
 35. The non-transitory computer-readable medium of claim 32, wherein the instructions that cause the processor to reconstruct the 3D point cloud comprise instructions that cause the processor to: extract a plurality of key points from the plurality of pictures; define a plurality of 3D slices of the object, wherein each 3D slice of the plurality of 3D slices comprises at least one key point of the plurality of key points; and construct, based on the 3D slices, the 3D point cloud.
 36. The non-transitory computer-readable medium of claim 32, wherein the instructions that cause the processor to perform the recognition process comprise instructions that cause the processor to input the 3D feature vectors to a machine learning algorithm.